Take Home Exercise 3

Author

Felicia Eng

Published

March 1, 2024

Modified

March 23, 2024

Take-home Exercise 3 will be similar to one of the prototype module prepared above in term of content but with the following differences:

In this take-home exercise, you are required to select one of the module of your proposed Geospatial Analytics Shiny Application and complete the following tasks:

Required R packages for First and Second Order Spatial Point Patterns

  1. tidyverse:

    • Purpose: A collection of packages for data manipulation and visualization, emphasizing a consistent and tidy data format.

    • Relevance to spatial order: Useful for cleaning and transforming spatial data into a structured format that facilitates analysis.

  2. knitr:

    • Purpose: Dynamic report generation in R, allowing integration of R code and results into documents.

    • Relevance to spatial order: Enables the creation of documents that include spatial analysis results, making it easier to communicate and share findings.

  3. rgdal:

    • Purpose: Provides bindings to the GDAL (Geospatial Data Abstraction Library) for reading and writing geospatial data formats.

    • Relevance to spatial order: Essential for handling spatial data in various formats, ensuring interoperability and compatibility.

  4. maptools:

    • Purpose: Tools for reading and handling spatial objects, particularly shapefiles.

    • Relevance to spatial order: Facilitates the manipulation and analysis of spatial data, especially when dealing with shapefiles.

  5. sf:

    • Purpose: Supports simple features for spatial data representation and manipulation.

    • Relevance to spatial order: Offers a modern and efficient way to handle spatial data, including points, lines, and polygons.

  6. raster:

    • Purpose: Deals with gridded spatial data, such as satellite imagery or climate data.

    • Relevance to spatial order: Useful for working with raster data, which involves spatial information arranged in a grid.

  7. spatstat:

    • Purpose: Analyzing spatial point patterns and processes.

    • Relevance to spatial order: Particularly focused on the first point spatial order, addressing the distribution and arrangement of individual points in space.

  8. tmap:

    • Purpose: Creates thematic maps for visualizing spatial data.

    • Relevance to spatial order: Helps in visually exploring and interpreting spatial patterns in data.

  9. tmaptools:

    • Purpose: Provides additional tools for working with thematic maps.

    • Relevance to spatial order: Complements tmap by offering additional functionalities for spatial data visualization.

  10. gridExtra:

    • Purpose: Extends the grid graphics system to arrange multiple grid-based figures on one page.

    • Relevance to spatial order: Useful for creating complex layouts when visualizing multiple spatial plots or maps.

  11. leaflet:

    • Purpose: Creates interactive web maps using JavaScript.

    • Relevance to spatial order: Enables the development of interactive maps, enhancing the exploration of spatial relationships and patterns.

  12. OpenStreetMap:

    • Purpose: Accesses and interacts with OpenStreetMap data.

    • Relevance to spatial order: Allows integration of OpenStreetMap data into spatial analysis, providing additional context to spatial patterns.

  13. ggstatsplot:

    • Purpose: Enhances ggplot2 with statistical summaries and plots.

    • Relevance to spatial order: Can be used for adding statistical summaries to spatial visualizations created with ggplot2.

  14. statsExpressions:

    • Purpose: Creates customizable expressions for statistical summaries.

    • Relevance to spatial order: Useful for customizing the presentation of statistical information in spatial analysis results.

  15. lubridate:

    • Purpose: Facilitates the manipulation of date-time objects.

    • Relevance to spatial order: Helps in handling temporal aspects of spatial data, which is crucial in understanding how spatial patterns change over time.

pacman::p_load(tidyverse, knitr, rgdal, maptools, sf,raster,spatstat, tmap,tmaptools, gridExtra, leaflet, OpenStreetMap, ggstatsplot, statsExpressions, lubridate, raster, gridExtra, skimr)

Data that is available from Airbnb

InsideAirbnb provides a snapshot of the following information:

  • Listings - Summary information on listings

  • Detailed Listings - Detailed listing information of airbnb for rent

  • Calendar - Detailed calendar data for listings

  • Reviews - Summary review data

  • Detailed Reviews - Detailed review data for listings

  • Neighbourhoods - list of neighbourhoods in the city and a neighbourhood GeoJSON file

Readind data from Airbnb

listings <- read_csv("data/listings.csv")
d_listings <- read_csv("data/detailedlistings.csv")
calendar <- read_csv("data/calendar.csv")
reviews <- read_csv("data/reviews.csv")
d_reviews <- read_csv("data/detailedreviews.csv")
neighbourhoods <- read_csv("data/neighbourhoods.csv")

Reading Spatial data from Airbnb

nhood_map_sf <- st_read(dsn = "data/neighbourhoods.geojson", 
                        layer="neighbourhoods") %>%
                st_transform(crs = 3414)
Reading layer `neighbourhoods' from data source 
  `C:\Feliciaeng29\IS415-GAA\Take-home_Ex\Take-home_Ex03\data\neighbourhoods.geojson' 
  using driver `GeoJSON'
Simple feature collection with 55 features and 2 fields
Geometry type: MULTIPOLYGON
Dimension:     XY
Bounding box:  xmin: 103.6054 ymin: 1.158699 xmax: 104.0885 ymax: 1.470775
Geodetic CRS:  WGS 84

Data Cleaning

glimpse(listings)
Rows: 3,457
Columns: 18
$ id                             <dbl> 71609, 71896, 71903, 275343, 275344, 28…
$ name                           <chr> "Villa in Singapore · ★4.44 · 2 bedroom…
$ host_id                        <dbl> 367042, 367042, 367042, 1439258, 143925…
$ host_name                      <chr> "Belinda", "Belinda", "Belinda", "Kay",…
$ neighbourhood_group            <chr> "East Region", "East Region", "East Reg…
$ neighbourhood                  <chr> "Tampines", "Tampines", "Tampines", "Bu…
$ latitude                       <dbl> 1.34537, 1.34754, 1.34531, 1.29015, 1.2…
$ longitude                      <dbl> 103.9589, 103.9596, 103.9610, 103.8081,…
$ room_type                      <chr> "Private room", "Private room", "Privat…
$ price                          <dbl> 150, 80, 80, 64, 78, 220, 85, 75, 69, 7…
$ minimum_nights                 <dbl> 92, 92, 92, 60, 60, 92, 92, 60, 60, 92,…
$ number_of_reviews              <dbl> 19, 24, 46, 20, 16, 12, 131, 17, 5, 81,…
$ last_review                    <date> 2020-01-17, 2019-10-13, 2020-01-09, 20…
$ reviews_per_month              <dbl> 0.13, 0.16, 0.30, 0.15, 0.11, 0.09, 0.9…
$ calculated_host_listings_count <dbl> 5, 5, 5, 51, 51, 5, 7, 51, 51, 7, 7, 1,…
$ availability_365               <dbl> 55, 91, 91, 183, 183, 54, 365, 183, 183…
$ number_of_reviews_ltm          <dbl> 0, 0, 0, 0, 3, 0, 0, 1, 2, 0, 0, 0, 0, …
$ license                        <chr> NA, NA, NA, "S0399", "S0399", NA, NA, "…
glimpse(reviews)
Rows: 36,905
Columns: 2
$ listing_id <dbl> 71609, 71609, 71609, 71609, 71609, 71609, 71609, 71609, 716…
$ date       <date> 2011-12-19, 2012-07-17, 2012-09-01, 2012-09-04, 2013-01-02…
glimpse(d_reviews)
Rows: 36,905
Columns: 6
$ listing_id    <dbl> 71609, 71609, 71609, 71609, 71609, 71609, 71609, 71609, …
$ id            <dbl> 793880, 1731810, 2162194, 2190615, 3221837, 10071818, 15…
$ date          <date> 2011-12-19, 2012-07-17, 2012-09-01, 2012-09-04, 2013-01…
$ reviewer_id   <dbl> 1456140, 1804182, 3113461, 1432123, 2759938, 11319720, 1…
$ reviewer_name <chr> "Max", "Zac", "Zahra", "Helmut", "Jack", "Emily", "Steve…
$ comments      <chr> "The rooms were clean and tidy. Beds very comfortable.\r…
glimpse(calendar)
Rows: 1,261,786
Columns: 7
$ listing_id     <dbl> 71609, 71609, 71609, 71609, 71609, 71609, 71609, 71609,…
$ date           <date> 2023-12-27, 2023-12-28, 2023-12-29, 2023-12-30, 2023-1…
$ available      <lgl> TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, T…
$ price          <chr> "$108.00", "$108.00", "$108.00", "$108.00", "$108.00", …
$ adjusted_price <lgl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
$ minimum_nights <dbl> 92, 92, 92, 92, 92, 92, 92, 92, 92, 92, 92, 92, 92, 92,…
$ maximum_nights <dbl> 1125, 1125, 1125, 1125, 1125, 1125, 1125, 1125, 1125, 1…
glimpse(d_listings)
Rows: 3,457
Columns: 75
$ id                                           <dbl> 71609, 71896, 71903, 2753…
$ listing_url                                  <chr> "https://www.airbnb.com/r…
$ scrape_id                                    <dbl> 2.023123e+13, 2.023123e+1…
$ last_scraped                                 <date> 2023-12-27, 2023-12-26, …
$ source                                       <chr> "previous scrape", "city …
$ name                                         <chr> "Villa in Singapore · ★4.…
$ description                                  <lgl> NA, NA, NA, NA, NA, NA, N…
$ neighborhood_overview                        <chr> NA, NA, "Quiet and view o…
$ picture_url                                  <chr> "https://a0.muscache.com/…
$ host_id                                      <dbl> 367042, 367042, 367042, 1…
$ host_url                                     <chr> "https://www.airbnb.com/u…
$ host_name                                    <chr> "Belinda", "Belinda", "Be…
$ host_since                                   <date> 2011-01-29, 2011-01-29, …
$ host_location                                <chr> "Singapore", "Singapore",…
$ host_about                                   <chr> "Hi My name is Belinda -H…
$ host_response_time                           <chr> "N/A", "N/A", "N/A", "wit…
$ host_response_rate                           <chr> "N/A", "N/A", "N/A", "100…
$ host_acceptance_rate                         <chr> "100%", "100%", "100%", "…
$ host_is_superhost                            <lgl> FALSE, FALSE, FALSE, FALS…
$ host_thumbnail_url                           <chr> "https://a0.muscache.com/…
$ host_picture_url                             <chr> "https://a0.muscache.com/…
$ host_neighbourhood                           <chr> "Tampines", "Tampines", "…
$ host_listings_count                          <dbl> 5, 5, 5, 51, 51, 5, 7, 51…
$ host_total_listings_count                    <dbl> 15, 15, 15, 68, 68, 15, 8…
$ host_verifications                           <chr> "['email', 'phone']", "['…
$ host_has_profile_pic                         <lgl> TRUE, TRUE, TRUE, TRUE, T…
$ host_identity_verified                       <lgl> TRUE, TRUE, TRUE, TRUE, T…
$ neighbourhood                                <chr> NA, NA, "Singapore, Singa…
$ neighbourhood_cleansed                       <chr> "Tampines", "Tampines", "…
$ neighbourhood_group_cleansed                 <chr> "East Region", "East Regi…
$ latitude                                     <dbl> 1.34537, 1.34754, 1.34531…
$ longitude                                    <dbl> 103.9589, 103.9596, 103.9…
$ property_type                                <chr> "Private room in villa", …
$ room_type                                    <chr> "Private room", "Private …
$ accommodates                                 <dbl> 3, 1, 2, 1, 1, 4, 2, 1, 1…
$ bathrooms                                    <lgl> NA, NA, NA, NA, NA, NA, N…
$ bathrooms_text                               <chr> "1 private bath", "Shared…
$ bedrooms                                     <lgl> NA, NA, NA, NA, NA, NA, N…
$ beds                                         <dbl> 3, 1, 2, 1, 1, 5, 1, 1, 1…
$ amenities                                    <chr> "[]", "[]", "[]", "[]", "…
$ price                                        <chr> "$150.00", "$80.00", "$80…
$ minimum_nights                               <dbl> 92, 92, 92, 60, 60, 92, 9…
$ maximum_nights                               <dbl> 365, 365, 365, 999, 999, …
$ minimum_minimum_nights                       <dbl> 92, 92, 92, 60, 60, 92, 9…
$ maximum_minimum_nights                       <dbl> 92, 92, 92, 60, 60, 92, 9…
$ minimum_maximum_nights                       <dbl> 1125, 1125, 1125, 1125, 1…
$ maximum_maximum_nights                       <dbl> 1125, 1125, 1125, 1125, 1…
$ minimum_nights_avg_ntm                       <dbl> 92, 92, 92, 60, 60, 92, 9…
$ maximum_nights_avg_ntm                       <dbl> 1125, 1125, 1125, 1125, 1…
$ calendar_updated                             <lgl> NA, NA, NA, NA, NA, NA, N…
$ has_availability                             <lgl> TRUE, TRUE, TRUE, TRUE, T…
$ availability_30                              <dbl> 30, 30, 30, 6, 6, 29, 30,…
$ availability_60                              <dbl> 34, 60, 60, 6, 6, 33, 60,…
$ availability_90                              <dbl> 55, 90, 90, 6, 6, 54, 90,…
$ availability_365                             <dbl> 55, 91, 91, 183, 183, 54,…
$ calendar_last_scraped                        <date> 2023-12-27, 2023-12-26, …
$ number_of_reviews                            <dbl> 19, 24, 46, 20, 16, 12, 1…
$ number_of_reviews_ltm                        <dbl> 0, 0, 0, 0, 3, 0, 0, 1, 2…
$ number_of_reviews_l30d                       <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0…
$ first_review                                 <date> 2011-12-19, 2011-07-30, …
$ last_review                                  <date> 2020-01-17, 2019-10-13, …
$ review_scores_rating                         <dbl> 4.44, 4.16, 4.41, 4.40, 4…
$ review_scores_accuracy                       <dbl> 4.37, 4.22, 4.39, 4.16, 4…
$ review_scores_cleanliness                    <dbl> 4.00, 4.09, 4.52, 4.26, 4…
$ review_scores_checkin                        <dbl> 4.63, 4.43, 4.63, 4.47, 4…
$ review_scores_communication                  <dbl> 4.78, 4.43, 4.64, 4.42, 4…
$ review_scores_location                       <dbl> 4.26, 4.17, 4.50, 4.53, 4…
$ review_scores_value                          <dbl> 4.32, 4.04, 4.36, 4.63, 4…
$ license                                      <chr> NA, NA, NA, "S0399", "S03…
$ instant_bookable                             <lgl> FALSE, FALSE, FALSE, TRUE…
$ calculated_host_listings_count               <dbl> 5, 5, 5, 51, 51, 5, 7, 51…
$ calculated_host_listings_count_entire_homes  <dbl> 0, 0, 0, 0, 0, 0, 1, 0, 0…
$ calculated_host_listings_count_private_rooms <dbl> 5, 5, 5, 51, 51, 5, 6, 51…
$ calculated_host_listings_count_shared_rooms  <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0…
$ reviews_per_month                            <dbl> 0.13, 0.16, 0.30, 0.15, 0…

Change the data type of id to characters

listings <- listings %>% mutate_at(vars(id, host_id), as.character)
reviews <- reviews %>% mutate_at(vars(listing_id), as.character)
d_reviews <- d_reviews %>% mutate_at(vars(id, reviewer_id, listing_id), as.character)
calendar <- calendar %>% mutate_at(vars(listing_id), as.character)
d_listings <- d_listings %>% mutate_at(vars(host_id, id), as.character)

Change price in detailed listings to numerical & remove $ and, symbol in cloumns where currency is read as character.

strip_dollars = function(x) {as.numeric(gsub("[\\$,]", "", x))}
d_listings[,61:65] <- sapply(d_listings[,61:65], strip_dollars)
d_listings[,67] <- sapply(d_listings[,67], strip_dollars)

Check for Missing Data

We check if there is any missing data and decide how to handle it. Missing data will include zero value, which we need to change to be able to analyse correctly.

Listings data

skim(listings)
Data summary
Name listings
Number of rows 3457
Number of columns 18
_______________________
Column type frequency:
character 8
Date 1
numeric 9
________________________
Group variables None

Variable type: character

skim_variable n_missing complete_rate min max empty n_unique whitespace
id 0 1.0 5 19 0 3457 0
name 0 1.0 31 81 0 1427 0
host_id 0 1.0 5 9 0 952 0
host_name 0 1.0 1 32 0 761 0
neighbourhood_group 0 1.0 11 17 0 5 0
neighbourhood 0 1.0 4 23 0 44 0
room_type 0 1.0 10 15 0 4 0
license 2070 0.4 5 29 0 112 0

Variable type: Date

skim_variable n_missing complete_rate min max median n_unique
last_review 1611 0.53 2014-06-28 2023-12-27 2022-05-07 999

Variable type: numeric

skim_variable n_missing complete_rate mean sd p0 p25 p50 p75 p100 hist
latitude 0 1.00 1.31 0.03 1.22 1.29 1.30 1.32 1.46 ▁▇▂▁▁
longitude 0 1.00 103.84 0.04 103.63 103.83 103.85 103.86 103.99 ▁▁▇▇▁
price 128 0.96 270.36 528.21 13.00 80.00 175.00 289.00 10286.00 ▇▁▁▁▁
minimum_nights 0 1.00 65.81 67.21 1.00 6.00 92.00 92.00 1000.00 ▇▁▁▁▁
number_of_reviews 0 1.00 10.68 34.38 0.00 0.00 1.00 5.00 757.00 ▇▁▁▁▁
reviews_per_month 1611 0.53 0.59 1.34 0.01 0.05 0.18 0.63 29.62 ▇▁▁▁▁
calculated_host_listings_count 0 1.00 47.25 72.12 1.00 2.00 11.00 62.00 260.00 ▇▂▁▁▁
availability_365 0 1.00 248.32 139.21 0.00 150.00 329.00 364.00 365.00 ▃▁▁▂▇
number_of_reviews_ltm 0 1.00 2.39 12.20 0.00 0.00 0.00 0.00 396.00 ▇▁▁▁▁

Calender

skim(calendar)
Data summary
Name calendar
Number of rows 1261786
Number of columns 7
_______________________
Column type frequency:
character 2
Date 1
logical 2
numeric 2
________________________
Group variables None

Variable type: character

skim_variable n_missing complete_rate min max empty n_unique whitespace
listing_id 0 1 5 19 0 3457 0
price 0 1 6 10 0 483 0

Variable type: Date

skim_variable n_missing complete_rate min max median n_unique
date 0 1 2023-12-26 2024-12-25 2024-06-26 366

Variable type: logical

skim_variable n_missing complete_rate mean count
available 0 1 0.69 TRU: 867321, FAL: 394465
adjusted_price 1261786 0 NaN :

Variable type: numeric

skim_variable n_missing complete_rate mean sd p0 p25 p50 p75 p100 hist
minimum_nights 3 1 72.93 81.61 1 6 92 92 1e+03 ▇▁▁▁▁
maximum_nights 3 1 880.19 1826.91 1 365 1125 1125 1e+05 ▇▁▁▁▁

Reviews and Detailed Reviews

skim(reviews)
Data summary
Name reviews
Number of rows 36905
Number of columns 2
_______________________
Column type frequency:
character 1
Date 1
________________________
Group variables None

Variable type: character

skim_variable n_missing complete_rate min max empty n_unique whitespace
listing_id 0 1 5 19 0 1846 0

Variable type: Date

skim_variable n_missing complete_rate min max median n_unique
date 0 1 2011-05-04 2023-12-27 2019-12-16 3650
skim(d_reviews)
Data summary
Name d_reviews
Number of rows 36905
Number of columns 6
_______________________
Column type frequency:
character 5
Date 1
________________________
Group variables None

Variable type: character

skim_variable n_missing complete_rate min max empty n_unique whitespace
listing_id 0 1 5 19 0 1846 0
id 0 1 6 19 0 36905 0
reviewer_id 0 1 5 9 0 34123 0
reviewer_name 0 1 1 34 0 16055 0
comments 5 1 1 5472 0 35192 0

Variable type: Date

skim_variable n_missing complete_rate min max median n_unique
date 0 1 2011-05-04 2023-12-27 2019-12-16 3650

Detailed Listings

skim(d_listings)
Data summary
Name d_listings
Number of rows 3457
Number of columns 75
_______________________
Column type frequency:
character 27
Date 4
logical 9
numeric 35
________________________
Group variables None

Variable type: character

skim_variable n_missing complete_rate min max empty n_unique whitespace
id 0 1.00 5 19 0 3457 0
listing_url 0 1.00 34 48 0 3457 0
source 0 1.00 11 15 0 2 0
name 0 1.00 31 81 0 1427 0
neighborhood_overview 1214 0.65 3 1000 0 1011 0
picture_url 0 1.00 61 156 0 3253 0
host_id 0 1.00 5 9 0 952 0
host_url 0 1.00 39 43 0 952 0
host_name 0 1.00 1 32 0 761 0
host_location 1296 0.63 5 32 0 44 0
host_about 1206 0.65 1 2315 0 446 1
host_response_time 0 1.00 3 18 0 5 0
host_response_rate 0 1.00 2 4 0 36 0
host_acceptance_rate 0 1.00 2 4 0 52 0
host_thumbnail_url 0 1.00 55 131 0 917 0
host_picture_url 0 1.00 57 134 0 917 0
host_neighbourhood 224 0.94 4 18 0 59 0
host_verifications 0 1.00 2 32 0 7 0
neighbourhood 1214 0.65 9 39 0 36 0
neighbourhood_cleansed 0 1.00 4 23 0 44 0
neighbourhood_group_cleansed 0 1.00 11 17 0 5 0
property_type 0 1.00 4 34 0 51 0
room_type 0 1.00 10 15 0 4 0
bathrooms_text 19 0.99 6 17 0 39 0
amenities 0 1.00 2 2 0 1 0
price 128 0.96 6 10 0 626 0
license 2070 0.40 5 29 0 112 0

Variable type: Date

skim_variable n_missing complete_rate min max median n_unique
last_scraped 0 1.00 2023-12-26 2023-12-27 2023-12-27 2
host_since 0 1.00 2009-06-29 2023-12-19 2017-01-03 818
calendar_last_scraped 0 1.00 2023-12-26 2023-12-27 2023-12-27 2
first_review 1611 0.53 2011-05-04 2023-12-13 2019-12-28 1308

Variable type: logical

skim_variable n_missing complete_rate mean count
description 3457 0.00 NaN :
host_is_superhost 4 1.00 0.11 FAL: 3080, TRU: 373
host_has_profile_pic 0 1.00 0.98 TRU: 3399, FAL: 58
host_identity_verified 0 1.00 0.89 TRU: 3094, FAL: 363
bathrooms 3457 0.00 NaN :
bedrooms 3457 0.00 NaN :
calendar_updated 3457 0.00 NaN :
has_availability 128 0.96 0.99 TRU: 3296, FAL: 33
instant_bookable 0 1.00 0.37 FAL: 2182, TRU: 1275

Variable type: numeric

skim_variable n_missing complete_rate mean sd p0 p25 p50 p75 p100 hist
scrape_id 0 1.00 2.023123e+13 0.00 2.023123e+13 2.023123e+13 2.023123e+13 2.023123e+13 2.023123e+13 ▁▁▇▁▁
host_listings_count 0 1.00 8.004000e+01 156.12 1.000000e+00 3.000000e+00 1.100000e+01 6.200000e+01 5.800000e+02 ▇▁▁▁▁
host_total_listings_count 0 1.00 1.299400e+02 243.48 1.000000e+00 5.000000e+00 1.700000e+01 9.600000e+01 8.750000e+02 ▇▁▁▁▁
latitude 0 1.00 1.310000e+00 0.03 1.220000e+00 1.290000e+00 1.300000e+00 1.320000e+00 1.460000e+00 ▁▇▂▁▁
longitude 0 1.00 1.038400e+02 0.04 1.036300e+02 1.038300e+02 1.038500e+02 1.038600e+02 1.039900e+02 ▁▁▇▇▁
accommodates 0 1.00 2.810000e+00 2.21 1.000000e+00 1.000000e+00 2.000000e+00 4.000000e+00 1.600000e+01 ▇▁▁▁▁
beds 82 0.98 1.800000e+00 2.08 1.000000e+00 1.000000e+00 1.000000e+00 2.000000e+00 4.600000e+01 ▇▁▁▁▁
minimum_nights 0 1.00 6.581000e+01 67.21 1.000000e+00 6.000000e+00 9.200000e+01 9.200000e+01 1.000000e+03 ▇▁▁▁▁
maximum_nights 0 1.00 7.816100e+02 1740.04 2.000000e+00 3.650000e+02 1.124000e+03 1.125000e+03 1.000000e+05 ▇▁▁▁▁
minimum_minimum_nights 0 1.00 6.576000e+01 67.53 1.000000e+00 6.000000e+00 9.200000e+01 9.200000e+01 1.000000e+03 ▇▁▁▁▁
maximum_minimum_nights 0 1.00 7.378000e+01 83.02 1.000000e+00 6.000000e+00 9.200000e+01 9.200000e+01 1.000000e+03 ▇▁▁▁▁
minimum_maximum_nights 0 1.00 8.779800e+02 1827.50 1.000000e+00 3.650000e+02 1.125000e+03 1.125000e+03 1.000000e+05 ▇▁▁▁▁
maximum_maximum_nights 0 1.00 8.883600e+02 1825.49 1.000000e+00 3.650000e+02 1.125000e+03 1.125000e+03 1.000000e+05 ▇▁▁▁▁
minimum_nights_avg_ntm 0 1.00 7.291000e+01 81.28 1.000000e+00 6.000000e+00 9.200000e+01 9.200000e+01 1.000000e+03 ▇▁▁▁▁
maximum_nights_avg_ntm 0 1.00 8.801600e+02 1827.00 1.000000e+00 3.650000e+02 1.125000e+03 1.125000e+03 1.000000e+05 ▇▁▁▁▁
availability_30 0 1.00 1.782000e+01 12.60 0.000000e+00 0.000000e+00 2.400000e+01 2.900000e+01 3.000000e+01 ▅▁▁▂▇
availability_60 0 1.00 3.788000e+01 24.75 0.000000e+00 6.000000e+00 5.300000e+01 5.900000e+01 6.000000e+01 ▃▁▁▁▇
availability_90 0 1.00 5.873000e+01 36.32 0.000000e+00 2.100000e+01 8.000000e+01 8.900000e+01 9.000000e+01 ▃▁▁▁▇
availability_365 0 1.00 2.483200e+02 139.21 0.000000e+00 1.500000e+02 3.290000e+02 3.640000e+02 3.650000e+02 ▃▁▁▂▇
number_of_reviews 0 1.00 1.068000e+01 34.38 0.000000e+00 0.000000e+00 1.000000e+00 5.000000e+00 7.570000e+02 ▇▁▁▁▁
number_of_reviews_ltm 0 1.00 2.390000e+00 12.20 0.000000e+00 0.000000e+00 0.000000e+00 0.000000e+00 3.960000e+02 ▇▁▁▁▁
number_of_reviews_l30d 0 1.00 1.700000e-01 1.08 0.000000e+00 0.000000e+00 0.000000e+00 0.000000e+00 3.800000e+01 ▇▁▁▁▁
last_review 3457 0.00 NaN NA NA NA NA NA NA
review_scores_rating 1611 0.53 4.540000e+00 0.62 1.000000e+00 4.360000e+00 4.710000e+00 5.000000e+00 5.000000e+00 ▁▁▁▂▇
review_scores_accuracy 1613 0.53 4.580000e+00 0.62 1.000000e+00 4.450000e+00 4.780000e+00 5.000000e+00 5.000000e+00 ▁▁▁▁▇
review_scores_cleanliness 1613 0.53 4.500000e+00 0.64 1.000000e+00 4.280000e+00 4.670000e+00 5.000000e+00 5.000000e+00 ▁▁▁▂▇
review_scores_checkin 1613 0.53 4.730000e+00 0.50 1.000000e+00 4.670000e+00 4.910000e+00 5.000000e+00 5.000000e+00 ▁▁▁▁▇
review_scores_communication 1612 0.53 4.710000e+00 0.56 1.000000e+00 4.670000e+00 4.910000e+00 5.000000e+00 5.000000e+00 ▁▁▁▁▇
review_scores_location 1613 0.53 4.680000e+00 0.48 1.000000e+00 4.560000e+00 4.830000e+00 5.000000e+00 5.000000e+00 ▁▁▁▁▇
review_scores_value 1613 0.53 4.440000e+00 0.64 1.000000e+00 4.230000e+00 4.590000e+00 4.940000e+00 5.000000e+00 ▁▁▁▂▇
calculated_host_listings_count 0 1.00 4.725000e+01 72.12 1.000000e+00 2.000000e+00 1.100000e+01 6.200000e+01 2.600000e+02 ▇▂▁▁▁
calculated_host_listings_count_entire_homes 0 1.00 3.720000e+01 73.83 0.000000e+00 0.000000e+00 1.000000e+00 2.700000e+01 2.600000e+02 ▇▁▁▁▁
calculated_host_listings_count_private_rooms 0 1.00 9.340000e+00 20.08 0.000000e+00 0.000000e+00 1.000000e+00 6.000000e+00 9.100000e+01 ▇▁▁▁▁
calculated_host_listings_count_shared_rooms 0 1.00 3.700000e-01 1.93 0.000000e+00 0.000000e+00 0.000000e+00 0.000000e+00 1.800000e+01 ▇▁▁▁▁
reviews_per_month 1611 0.53 5.900000e-01 1.34 1.000000e-02 5.000000e-02 1.800000e-01 6.300000e-01 2.962000e+01 ▇▁▁▁▁

Data Wrangling

We need to convert the listings data into sf object.

listings_sf <- listings %>% 
                st_as_sf(coords = c("longitude", "latitude"),
                         crs = 4326) %>%
                st_transform(crs = 3414)

Display the 10 records and show the geometry type and we can check that projected CRIS is svy21.

head(listings_sf)
Simple feature collection with 6 features and 16 fields
Geometry type: POINT
Dimension:     XY
Bounding box:  xmin: 25197.84 ymin: 30085.83 xmax: 42209.55 ymax: 36630.01
Projected CRS: SVY21 / Singapore TM
# A tibble: 6 × 17
  id     name      host_id host_name neighbourhood_group neighbourhood room_type
  <chr>  <chr>     <chr>   <chr>     <chr>               <chr>         <chr>    
1 71609  Villa in… 367042  Belinda   East Region         Tampines      Private …
2 71896  Home in … 367042  Belinda   East Region         Tampines      Private …
3 71903  Home in … 367042  Belinda   East Region         Tampines      Private …
4 275343 Rental u… 1439258 Kay       Central Region      Bukit Merah   Private …
5 275344 Rental u… 1439258 Kay       Central Region      Bukit Merah   Private …
6 289234 Home in … 367042  Belinda   East Region         Tampines      Private …
# ℹ 10 more variables: price <dbl>, minimum_nights <dbl>,
#   number_of_reviews <dbl>, last_review <date>, reviews_per_month <dbl>,
#   calculated_host_listings_count <dbl>, availability_365 <dbl>,
#   number_of_reviews_ltm <dbl>, license <chr>, geometry <POINT [m]>

Show the point details of the geometry column, givin gthe x, y coordianted in SVY21

glimpse(listings_sf)
Rows: 3,457
Columns: 17
$ id                             <chr> "71609", "71896", "71903", "275343", "2…
$ name                           <chr> "Villa in Singapore · ★4.44 · 2 bedroom…
$ host_id                        <chr> "367042", "367042", "367042", "1439258"…
$ host_name                      <chr> "Belinda", "Belinda", "Belinda", "Kay",…
$ neighbourhood_group            <chr> "East Region", "East Region", "East Reg…
$ neighbourhood                  <chr> "Tampines", "Tampines", "Tampines", "Bu…
$ room_type                      <chr> "Private room", "Private room", "Privat…
$ price                          <dbl> 150, 80, 80, 64, 78, 220, 85, 75, 69, 7…
$ minimum_nights                 <dbl> 92, 92, 92, 60, 60, 92, 92, 60, 60, 92,…
$ number_of_reviews              <dbl> 19, 24, 46, 20, 16, 12, 131, 17, 5, 81,…
$ last_review                    <date> 2020-01-17, 2019-10-13, 2020-01-09, 20…
$ reviews_per_month              <dbl> 0.13, 0.16, 0.30, 0.15, 0.11, 0.09, 0.9…
$ calculated_host_listings_count <dbl> 5, 5, 5, 51, 51, 5, 7, 51, 51, 7, 7, 1,…
$ availability_365               <dbl> 55, 91, 91, 183, 183, 54, 365, 183, 183…
$ number_of_reviews_ltm          <dbl> 0, 0, 0, 0, 3, 0, 0, 1, 2, 0, 0, 0, 0, …
$ license                        <chr> NA, NA, NA, "S0399", "S0399", NA, NA, "…
$ geometry                       <POINT [m]> POINT (41972.5 36390.05), POINT (…

Data Analysis

Type of Accomdation

Room types by region

Summarizing the types of listings by neighbourhood groups

regionlist <- listings %>%
              group_by(neighbourhood_group, room_type) %>%
              summarise(
                num_listings = n(),
                avg_price = mean(price),
                med_price = median(price))

Plotting the type of accomdation by region

ggplot(regionlist, aes(x=room_type, fill = room_type)) + theme(axis.text.x = element_blank(), axis.ticks.x = element_blank()) + 
      geom_col(aes(y = num_listings)) +
      facet_grid(cols=vars(neighbourhood_group), margins = T, labeller = labeller(neighbourhood_group = label_wrap_gen(width = 5, multi_line = TRUE))) +
      labs(x = "", y = "No. of listings", fill = "Room Type")

Pricing of Room Types

Remove price outliers

outlier_price <- quantile(listings$price, 0.99, na.rm = TRUE)
listings_cleanprice <- listings %>% filter(price <= outlier_price)

Plot boxplot of prices for each room type

p1 <- ggplot(listings, aes(x=room_type, fill = room_type)) + 
  theme(axis.text.x = element_blank(), axis.ticks.x = element_blank(), legend.position = "None") + 
  geom_boxplot(aes(y=price)) +
  labs(x = "", y = "Listing Price", fill = "Room Type", title = "Listing price")

Cleaned pricing

p2 <- ggplot(listings_cleanprice, aes(x=room_type, fill = room_type)) + 
  theme(axis.text.x = element_blank(), axis.ticks.x = element_blank()) + 
  geom_boxplot(aes(y=price)) +
  labs(x = "", y = "Listing Price", fill = "Room Type", title = "Listing price (outliers removed)")

grid.arrange(p1, p2, nrow = 1)

Room Types by Price and Region

ggplot(listings_cleanprice, aes(x=neighbourhood_group, fill = neighbourhood_group)) + 
  theme(axis.text.x = element_blank(), axis.ticks.x = element_blank()) +
  geom_boxplot(aes(y=price)) +
  facet_grid(~room_type, labeller = labeller(room_type = label_wrap_gen(width = 5, multi_line = TRUE))) +
  labs(x = "", y = "Listing Price", fill = "Region")

Deductive Data Analysis

Hosts & Listings

Host with multiple listings

Create a table of percetnage of hosts by number of listings

list_byhost <- listings %>%
                group_by(host_id, host_name) %>%
                count(name = "number_of_listings", sort = TRUE) %>%
                ungroup() %>%
                group_by(number_of_listings) %>%
                count(name = "number_of_hosts")

Plot above table

ggplot(list_byhost, aes(x=number_of_listings, y= number_of_hosts/sum(number_of_hosts)*100)) +
  geom_point() +
  labs(y="Percentage of hosts", title = "% of hosts vs number of listings", x = "number of listings")

Create Chart

list_byhost <- list_byhost[order(list_byhost$number_of_listings, decreasing = TRUE),]
list_byhost$number_of_listings1 <- factor(list_byhost$number_of_listings, levels = list_byhost$number_of_listings)
list_byhost$listfreq <- list_byhost$number_of_hosts * list_byhost$number_of_listings
list_byhost$cumul <- cumsum(list_byhost$listfreq)
nr <- nrow(list_byhost)
N <- sum(list_byhost$listfreq)
y2 <- c("  0%", " 10%", " 20%", " 30%", " 40%", " 50%", " 60%", " 70%", " 80%", " 90%", "100%")
ggplot(list_byhost, aes(x=number_of_listings1)) +
  geom_bar(aes(y=number_of_hosts), fill = "blue", stat = "identity") +
  geom_point(aes(x=number_of_listings1, y=cumul)) +
  geom_line(aes(x=number_of_listings1, y=cumul)) +
  geom_path(aes(y=cumul, group=1)) +
  labs(y="Frequency", title = "Chart of hosts and listings", x = "No. of listings") +
  theme(plot.margin = margin(c(1,1,1,1), unit="cm"), axis.text.x = element_text(angle=90, vjust=0.6)) +
  annotate("text", x = nr + 3, y = seq(0, N, N/10), label = y2, size = 3.5, hjust = "inward")

Distribution of host type and room type

Create column with single or multiple host types

listings <- listings %>% mutate(host_type = ifelse(calculated_host_listings_count ==1, "Single", "Multiple"))

Plot to visualise

mosaicplot(listings$room_type ~ listings$host_type, color = c("steelblue", "wheat"), xlab = "Room Type", ylab = "Host Type", main = "Plot of Room Type and Host Type")

Most hosts with Single listings mainly offer private rooms, followed by shared rooms. This suggests they’re likely renting out their extra or shared bedrooms to earn extra money. On the other hand, hosts offering entire homes or apartments might own investment properties or be away during the rental period. There are also a few hosts with single hotel room listings, which could be because of mistakes in the room type or because they offer special or boutique accommodations. Usually, hotel operators have multiple listings, so these single hotel room listings might be unique.

#### Analysis of price of room types by single or multiple hosts

# Set seed for reproducibility
# set.seed(123)
# test2 <- listings %>% dplyr::select(room_type, host_type, neighbourhood_group, price)
# grouped_ggbetweenstats(
#   data = test2,
#   x = host_type,
#   y = price,
#   grouping.var = room_type,
#   ggsignif.args = list(textsize = 4, tip_length = 0.01),
#   p.adjust.method = "bonferroni", # method for adjusting p-values for multiple comparisons
#   # adding new components to `ggstatsplot` default
#   ggplot.component = list(ggplot2::scale_y_continuous(sec.axis = ggplot2::dup_axis())),
#   k = 3,
#   title.prefix = "Room Type",
#   palette = "default_jama",
#   package = "ggsci",
#   plotgrid.args = list(nrow = 2),
#   title.text = "Differences in listing prices for single/multiple hosts by different room types"
# )
# grouped_ggbetweenstats(
#   data = test2 %>% filter(room_type != "Hotel room"),
#   x = host_type,
#   y = price,
#   grouping.var = neighbourhood_group,
#   ggsignif.args = list(textsize = 4, tip_length = 0.01),
#   p.adjust.method = "bonferroni", # method for adjusting p-values for multiple comparisons
#   # adding new components to `ggstatsplot` default
#   # ggplot.component = list(ggplot2::scale_y_continuous(sec.axis = ggplot2::dup_axis())),
#   # k = 3,
#   title.prefix = "Room Type",
#   palette = "default_jama",
#   package = "ggsci",
#   # plotgrid.args = list(nrow = 2),
#   title.text = "Differences in listing prices for single/multiple hosts by different neighbourhoods", 
#   output = "subtitle"
# )

Which year did the hosts join airbnb

host_byyear <- d_listings %>% dplyr::select(id, host_id, host_since) %>% group_by(year_joined = year(host_since)) %>% drop_na() %>% summarise(number_hosts = n()) %>% ungroup() %>% mutate(change = (number_hosts - lag(number_hosts)) / lag(number_hosts)*100)

ggplot(host_byyear, aes(x=year_joined, y = number_hosts)) + geom_bar(stat = "identity", fill = "steelblue") + 
   labs(title = "New hosts by year", y = "No. of hosts joined", x = "year") +
   geom_text(aes(label = number_hosts), vjust = -0.3)

Detailed Information on Hosts

How many listings do not have reviews

d_listings %>% filter(., host_acceptance_rate == "0%" | is.na(host_acceptance_rate)) %>% dplyr::select(id, host_acceptance_rate, number_of_reviews) %>% arrange(desc(number_of_reviews))
# A tibble: 97 × 3
   id       host_acceptance_rate number_of_reviews
   <chr>    <chr>                            <dbl>
 1 22051870 0%                                 159
 2 22745179 0%                                 145
 3 32411075 0%                                 141
 4 16502098 0%                                 140
 5 11228389 0%                                  84
 6 37484436 0%                                  69
 7 17684352 0%                                  47
 8 17949590 0%                                  47
 9 7321238  0%                                  34
10 32205289 0%                                  31
# ℹ 87 more rows

Reviews

Listings with no reviews

no_reviews <- listings_sf %>% filter(is.na(last_review))
host_join <- d_listings %>% dplyr::select(id, host_since)
no_reviews_host <- left_join(no_reviews, host_join, by = c("id")) %>% group_by(year_joined = year(host_since)) %>% summarise(number_hosts = n()) %>% drop_na()  %>% ungroup()
no_reviews_host$year_joined = as.character(no_reviews_host$year_joined)
ggplot(no_reviews_host, aes(x=year_joined, y = number_hosts)) + geom_bar(stat = "identity", fill = "steelblue4") + 
   labs(title = "Listings with no reviews - Hosts by year", y = "No. of hosts joined", x = "year") + geom_text(aes(label = number_hosts), vjust = -0.3) + scale_x_discrete(breaks = no_reviews_host$year_joined)

Review score

select relevant data for review scores

review_scores <- d_listings %>% dplyr::select(id, host_id, number_of_reviews, room_type, review_scores_rating, review_scores_accuracy, review_scores_cleanliness, review_scores_communication, review_scores_checkin, review_scores_location, review_scores_value, reviews_per_month)

Plot histogram of overall rating

ggplot(review_scores, aes(x=review_scores_rating)) + geom_histogram(aes(fill = room_type), bins = 20) + stat_bin(aes(label = ..count..), bins = 20, size = 3, geom= "text", vjust = -1)

Dsitribution of overall review score

Plot histogram of attribute score

ggplot(gather(review_scores[, -c(1:5,12)], cols, value), aes(x = value)) + 
       geom_histogram(binwidth = 1) + facet_grid(.~cols)

Spatial Distribution of Airbnb listings in Singapore

Handling Spatial Data Outliers

Identify listings that fall within the Water Cachement Area

filter(listings, neighbourhood == "Central Water Catchment")
# A tibble: 5 × 19
  id          name  host_id host_name neighbourhood_group neighbourhood latitude
  <chr>       <chr> <chr>   <chr>     <chr>               <chr>            <dbl>
1 36009935    Serv… 238891… Neha      North Region        Central Wate…     1.35
2 36852543    Cond… 156409… Tushita   North Region        Central Wate…     1.35
3 43424855    Plac… 346107… Chris     North Region        Central Wate…     1.35
4 46669109    Rent… 238891… Neha      North Region        Central Wate…     1.35
5 9148448679… Rent… 444964… Richard   North Region        Central Wate…     1.36
# ℹ 12 more variables: longitude <dbl>, room_type <chr>, price <dbl>,
#   minimum_nights <dbl>, number_of_reviews <dbl>, last_review <date>,
#   reviews_per_month <dbl>, calculated_host_listings_count <dbl>,
#   availability_365 <dbl>, number_of_reviews_ltm <dbl>, license <chr>,
#   host_type <chr>

Remove listings in the Central Water Catchment, Sungei Kadut and Mandai areas

listings_clean <- filter(listings_sf, !neighbourhood %in% c("Central Water Catchment", "Sungei Kadut", "Mandai", "Western Water Catchment")) %>% st_as_sf()

Mapping Airbnb listings in Singapore

Loading basemap raster with bounding box

Read in OSM raster of listings data for plot view and create bounding box

sg_osm <- tmaptools::read_osm(listings_clean, ext=1.3)
bb_sg_osm <- st_bbox(listings_clean, crs = 3414)

Listing by room types and neighbourhood

Plotting neighbourhood listings on tmap

st_is_valid(nhood_map_sf)
 [1] FALSE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE FALSE FALSE FALSE  TRUE
[13]  TRUE FALSE  TRUE  TRUE FALSE  TRUE FALSE  TRUE  TRUE FALSE  TRUE  TRUE
[25]  TRUE  TRUE  TRUE FALSE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE FALSE  TRUE
[37]  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE FALSE FALSE FALSE  TRUE  TRUE  TRUE
[49]  TRUE  TRUE  TRUE  TRUE  TRUE FALSE  TRUE
nhood_map_sf <- st_make_valid(nhood_map_sf)
st_is_valid(nhood_map_sf)
 [1] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
[16] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
[31] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
[46] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
class(nhood_map_sf)
[1] "sf"         "data.frame"
nhood_map_sf <- na.omit(nhood_map_sf)

Deleting cos after analyzing realise the geometries is invalid

nhood_map_sf <- nhood_map_sf[-c(28, 45), ]
tmap_mode("view")

# Plotting points
tm_basemap(leaflet::providers$OpenStreetMap) +
tm_shape(nhood_map_sf) +
  tm_polygons(alpha = 0.3) +
tm_shape(listings_clean) +
  tm_symbols(col="room_type", size = 0.2) +
  tm_view(set.zoom.limits = c(11, 17)) +
  tm_facets(by="room_type") +
  tm_layout(legend.show = F)

Rental prices by room type

Removing price outliers

listings_sf_price <- listings_clean %>% filter(price <= outlier_price)
tm_basemap(leaflet::providers$OpenStreetMap) +
tm_shape(sg_osm, bbox=bb_sg_osm) +
  tm_rgb() +
tm_shape(nhood_map_sf) +
  tm_polygons(alpha = 0.3) +
  tm_shape(listings_sf_price) +
  tm_symbols(col = "price", size = 0.2, palette = "YlOrBr", legend.hist = TRUE) +
  # tm_view(set.zoom.limits = c(11, 18)) +
  tm_facets(by="room_type") +
  tm_layout(legend.outside = TRUE, legend.outside.position = "bottom", legend.stack = "horizontal", legend.hist.height = 1, legend.hist.width = 0.85, legend.outside.size=0.1)

Rental prices by room type and host type

listings_sf_price <- listings_sf_price %>% mutate(host_type = ifelse(calculated_host_listings_count ==1, "Single", "Multiple"))
tm_basemap(leaflet::providers$OpenStreetMap) +
tm_shape(sg_osm, bbox=bb_sg_osm) +
  tm_rgb() +
tm_shape(nhood_map_sf) +
  tm_polygons(alpha = 0.3) +
  tm_shape(listings_sf_price) +
  tm_symbols(col = "host_type", shape = "price", size = 0.2, title.col = "Host Type", title.shape = "Price") +
  tm_facets(by="room_type")+
  tm_layout(legend.outside = TRUE, legend.outside.position = "bottom", legend.stack = "horizontal", legend.outside.size=0.1)

Chlorpleth map of listings and median

Summary of cleaned price listings by neighbourhood and room type

# Create a summary of the cleaned price listings by neighbourhood and room type (number of listings and median price)
listings_cleanprice_sum <- st_drop_geometry(listings_sf_price) %>%
  group_by(neighbourhood, room_type) %>%
  summarise(num_listings = n(), med_price = median(price), .groups = "keep") %>%
  arrange(desc(num_listings))

listings_join <- left_join(nhood_map_sf, listings_cleanprice_sum, by = c("neighbourhood"))
#take out all the NA values data set
listings_join <- na.omit(listings_join)
tmap_mode("plot")
tmap_arrange(
tm_basemap(leaflet::providers$OpenStreetMap) +
tm_shape(sg_osm, bbox=bb_sg_osm) +
  tm_rgb() +
  tm_shape(listings_join) +
   tm_polygons("med_price", title = "Median Price") +
   tm_view(set.zoom.limits = c(10, 18)) + 
   tm_facets(by="room_type", drop.NA.facets = T) +
   tm_layout(legend.outside = TRUE, legend.outside.position = "bottom", legend.outside.size = 0.2),
tm_basemap(leaflet::providers$OpenStreetMap) +
tm_shape(sg_osm, bbox=bb_sg_osm) +
  tm_rgb() +
   tm_shape(listings_join) +
   tm_polygons("num_listings", title = "No. of listings", palette = "Blues", alpha = 0.6) +
   tm_view(set.zoom.limits = c(10, 18)) + 
   tm_facets(by="room_type", drop.NA.facets = T) +
   tm_layout(legend.outside = TRUE, legend.outside.position = "bottom", legend.outside.size = 0.2)
)

Data that will be used for wrangling subzone map

listings to be converted from the listings_sf dataframe to SVY21 format (otherwise it will be out of bounds when using Spatstat) and converting to SpatialPoints (sp) #Remove listings from neighbourhoods that are not zoned for residential

listings_clean <- st_transform(listings_clean, 3414)
nonreslisting <- c("Tuas", "Pioneer")
listings_clean <- listings_clean %>% filter(!neighbourhood %in% nonreslisting)
listings_sp <- listings_clean %>% dplyr::select(geometry, room_type) %>% as(., Class = "Spatial")
summary(listings_sp)
Object of class SpatialPointsDataFrame
Coordinates:
               min      max
coords.x1 11133.06 45389.01
coords.x2 22713.81 48821.86
Is projected: TRUE 
proj4string :
[+proj=tmerc +lat_0=1.36666666666667 +lon_0=103.833333333333 +k=1
+x_0=28001.642 +y_0=38744.572 +ellps=WGS84 +towgs84=0,0,0,0,0,0,0
+units=m +no_defs]
Number of points: 3443
Data attributes:
  room_type        
 Length:3443       
 Class :character  
 Mode  :character  
plot(listings_sp)

Wrangling subzone

  • MP14_SUBZONE_WEB_PL.shp
mpsz_sf <- st_read(dsn = "data/geospatial",
                layer = "MP14_SUBZONE_WEB_PL")
Reading layer `MP14_SUBZONE_WEB_PL' from data source 
  `C:\Feliciaeng29\IS415-GAA\Take-home_Ex\Take-home_Ex03\data\geospatial' 
  using driver `ESRI Shapefile'
Simple feature collection with 323 features and 15 fields
Geometry type: MULTIPOLYGON
Dimension:     XY
Bounding box:  xmin: 2667.538 ymin: 15748.72 xmax: 56396.44 ymax: 50256.33
Projected CRS: SVY21
sg_osm <- read_osm(listings_clean, ext=1.3)

# Ensure that geometry is valid and check crs of subzone map
mpsz_sf <- st_make_valid(mpsz_sf)
all(st_is_valid(mpsz_sf))
[1] TRUE
crs(mpsz_sf)
[1] "PROJCRS[\"SVY21\",\n    BASEGEOGCRS[\"SVY21[WGS84]\",\n        DATUM[\"World Geodetic System 1984\",\n            ELLIPSOID[\"WGS 84\",6378137,298.257223563,\n                LENGTHUNIT[\"metre\",1]],\n            ID[\"EPSG\",6326]],\n        PRIMEM[\"Greenwich\",0,\n            ANGLEUNIT[\"Degree\",0.0174532925199433]]],\n    CONVERSION[\"unnamed\",\n        METHOD[\"Transverse Mercator\",\n            ID[\"EPSG\",9807]],\n        PARAMETER[\"Latitude of natural origin\",1.36666666666667,\n            ANGLEUNIT[\"Degree\",0.0174532925199433],\n            ID[\"EPSG\",8801]],\n        PARAMETER[\"Longitude of natural origin\",103.833333333333,\n            ANGLEUNIT[\"Degree\",0.0174532925199433],\n            ID[\"EPSG\",8802]],\n        PARAMETER[\"Scale factor at natural origin\",1,\n            SCALEUNIT[\"unity\",1],\n            ID[\"EPSG\",8805]],\n        PARAMETER[\"False easting\",28001.642,\n            LENGTHUNIT[\"metre\",1],\n            ID[\"EPSG\",8806]],\n        PARAMETER[\"False northing\",38744.572,\n            LENGTHUNIT[\"metre\",1],\n            ID[\"EPSG\",8807]]],\n    CS[Cartesian,2],\n        AXIS[\"(E)\",east,\n            ORDER[1],\n            LENGTHUNIT[\"metre\",1,\n                ID[\"EPSG\",9001]]],\n        AXIS[\"(N)\",north,\n            ORDER[2],\n            LENGTHUNIT[\"metre\",1,\n                ID[\"EPSG\",9001]]]]"
mpsz_sf2 <- mpsz_sf %>% dplyr::select(SUBZONE_N, REGION_N)
mpsz_sf2 <- st_transform(mpsz_sf2, st_crs(listings_clean))
listings_sf2 <- listings_clean %>% dplyr::select(neighbourhood, room_type)
listings_subzones <- st_join(listings_sf2, mpsz_sf2, prepared=TRUE, join=st_within)
subzone_list <- unique(listings_subzones$SUBZONE_N)
mpsz_sf3 <- mpsz_sf %>% filter(SUBZONE_N %in% subzone_list)
plot(mpsz_sf3["SUBZONE_N"])

Splitting data into subregions

region_list <- unique(listings_subzones$REGION_N)

# Create a store of region SF
region_store_sf <- list()

# Create list of sub-regions
for (i in seq_along(region_list)) {
  region_name <- paste(tolower(word(region_list[i],1)))
  region_name <- gsub("-", "", region_name)
  var_name <- paste("sf", region_name, sep="_")
  region_store_sf[[region_name]] <- mpsz_sf3[mpsz_sf3$REGION_N == region_list[i],c("SUBZONE_N", "PLN_AREA_N")]
}

plot(region_store_sf$northeast)

# Convert to owin store, with subzone boundaries dissolved
region_store_owin <- list()

for (i in seq_along(region_list)) {
  region_name <- paste(tolower(word(region_list[i],1))) %>% gsub("-", "", .)
  var_name <- paste("sf", region_name, sep="_")
  region_store_owin[[region_name]] <- mpsz_sf3[mpsz_sf3$REGION_N == region_list[i],3] %>% st_union(.) %>% as(., Class = "Spatial") %>% as(., "owin")
}

# Plot owin to check
plot(region_store_owin$northeast)

# create ppp in owin objects 
listings_ppp <- as.ppp(listings_sp)
marks(listings_ppp) <- factor(listings_clean$room_type)
listing_ppp_jit <- rjitter(listings_ppp, retry = TRUE, nsim = 1, drop = TRUE)
# Create ppp in owin objects split by sub-regions, rescaled to km
ppp_east <- listing_ppp_jit[region_store_owin$east] %>% rescale(., 1000, "km")
ppp_central <- listing_ppp_jit[region_store_owin$central] %>% rescale(., 1000, "km")
ppp_west <- listing_ppp_jit[region_store_owin$west] %>% rescale(., 1000, "km")
ppp_north <- listing_ppp_jit[region_store_owin$north] %>% rescale(., 1000, "km")
ppp_northeast <- listing_ppp_jit[region_store_owin$northeast] %>% rescale(., 1000, "km")

# create object store split by region, then by room type of form ppp_store$region$roomtype
types_room <- unique(listings_sf$room_type)
ppp_store4 <- list()
for (i in seq_along(region_list)) {
  region_name <- paste(tolower(word(region_list[i],1))) %>% gsub("-", "", .)
  ppp_store4[[region_name]] <- list()
  for (j in seq_along(types_room)) {
    room_name <- paste(tolower(word(types_room[j],1)))
    var_name <- paste("ppp", region_name, sep="_")
    ppp_store4[[region_name]][["all"]] <- eval(as.name(var_name))
    ppp_store4[[region_name]][[room_name]] <- ppp_store4[[region_name]][["all"]] [ppp_store4[[region_name]][["all"]][["marks"]]==types_room[j]]
  }
}

Second Order Analysis

K Function test for regions

K-function test for East Region

Ktest_east_hotel <- Kest(ppp_store4$east$hotel, correction = "Ripley")
Ktest_east_hotel.csr <- envelope(ppp_store4$east$hotel, Kest, nsim=99, rank=1, glocal=TRUE)
Generating 99 simulations of CSR  ...
1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20,
21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40,
41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60,
61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80,
81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98, 
99.

Done.
Ktest_east_entire <- Kest(ppp_store4$east$entire, correction = "Ripley")
Ktest_east_entire.csr <- envelope(ppp_store4$east$entire, Kest, nsim=99, rank=1, glocal=TRUE)
Generating 99 simulations of CSR  ...
1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20,
21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40,
41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60,
61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80,
81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98, 
99.

Done.
Ktest_east_private <- Kest(ppp_store4$east$private, correction = "Ripley")
Ktest_east_private.csr <- envelope(ppp_store4$east$private, Kest, nsim=99, rank=1, glocal=TRUE)
Generating 99 simulations of CSR  ...
1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20,
21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40,
41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60,
61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80,
81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98, 
99.

Done.
Ktest_east_shared <- Kest(ppp_store4$east$shared, correction = "Ripley")
Ktest_east_shared.csr <- envelope(ppp_store4$east$shared, Kest, nsim=99, rank=1, glocal=TRUE)
Generating 99 simulations of CSR  ...
1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20,
21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40,
41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60,
61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80,
81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98, 
99.

Done.
Ktestfiles = as.list(dir(pattern="Ktest__*"))
lapply(Ktestfiles,load,.GlobalEnv)
list()

East Region

summary(ppp_store4$east$all$marks)
Entire home/apt      Hotel room    Private room     Shared room 
             41               0             149               5 

Plot K-function test results for East Region

par(mfrow=c(2,2))
plot(Ktest_east_hotel.csr, .-r~r, xlab="d", ylab="K(d)-r")
plot(Ktest_east_entire.csr, .-r~r, xlab="d", ylab="K(d)-r")
plot(Ktest_east_private.csr, .-r~r, xlab="d", ylab="K(d)-r")
plot(Ktest_east_shared.csr, .-r~r, xlab="d", ylab="K(d)-r")

West Region

Ktest_west_hotel <- Kest(ppp_store4$west$hotel, correction = "Ripley")
Ktest_west_hotel.csr <- envelope(ppp_store4$west$hotel, Kest, nsim=99, rank=1, glocal=TRUE)
Generating 99 simulations of CSR  ...
1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20,
21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40,
41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60,
61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80,
81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98, 
99.

Done.
Ktest_west_entire <- Kest(ppp_store4$west$entire, correction = "Ripley")
Ktest_west_entire.csr <- envelope(ppp_store4$west$entire, Kest, nsim=99, rank=1, glocal=TRUE)
Generating 99 simulations of CSR  ...
1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20,
21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40,
41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60,
61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80,
81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98, 
99.

Done.
Ktest_west_private <- Kest(ppp_store4$west$private, correction = "Ripley")
Ktest_west_private.csr <- envelope(ppp_store4$west$private, Kest, nsim=99, rank=1, glocal=TRUE)
Generating 99 simulations of CSR  ...
1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20,
21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40,
41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60,
61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80,
81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98, 
99.

Done.
Ktest_west_shared <- Kest(ppp_store4$west$shared, correction = "Ripley")
Ktest_west_shared.csr <- envelope(ppp_store4$west$shared, Kest, nsim=99, rank=1, glocal=TRUE)
Generating 99 simulations of CSR  ...
1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20,
21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40,
41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60,
61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80,
81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98, 
99.

Done.
Ktestfiles = as.list(dir(pattern="Ktest__*"))
lapply(Ktestfiles,load,.GlobalEnv)
list()
summary(ppp_store4$west$all$marks)
Entire home/apt      Hotel room    Private room     Shared room 
            104               0             132               8 

Plot K-function test results for West Region

par(mfrow=c(2,2))
plot(Ktest_west_hotel.csr, .-r~r, xlab="d", ylab="K(d)-r")
plot(Ktest_west_entire.csr, .-r~r, xlab="d", ylab="K(d)-r")
plot(Ktest_west_private.csr, .-r~r, xlab="d", ylab="K(d)-r")
plot(Ktest_west_shared.csr, .-r~r, xlab="d", ylab="K(d)-r")

North Region

Ktest_north_hotel <- Kest(ppp_store4$north$hotel, correction = "Ripley")
Ktest_north_hotel.csr <- envelope(ppp_store4$north$hotel, Kest, nsim=99, rank=1, glocal=TRUE)
Generating 99 simulations of CSR  ...
1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20,
21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40,
41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60,
61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80,
81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98, 
99.

Done.
Ktest_north_entire <- Kest(ppp_store4$north$entire, correction = "Ripley")
Ktest_north_entire.csr <- envelope(ppp_store4$north$entire, Kest, nsim=99, rank=1, glocal=TRUE)
Generating 99 simulations of CSR  ...
1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20,
21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40,
41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60,
61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80,
81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98, 
99.

Done.
Ktest_north_private <- Kest(ppp_store4$north$private, correction = "Ripley")
Ktest_north_private.csr <- envelope(ppp_store4$north$private, Kest, nsim=99, rank=1, glocal=TRUE)
Generating 99 simulations of CSR  ...
1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20,
21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40,
41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60,
61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80,
81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98, 
99.

Done.
Ktest_north_shared <- Kest(ppp_store4$north$shared, correction = "Ripley")
Ktest_north_shared.csr <- envelope(ppp_store4$north$shared, Kest, nsim=99, rank=1, glocal=TRUE)
Generating 99 simulations of CSR  ...
1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20,
21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40,
41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60,
61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80,
81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98, 
99.

Done.
summary(ppp_store4$north$all$marks)
Entire home/apt      Hotel room    Private room     Shared room 
              6               0              73               0 
par(mfrow=c(2,2))
plot(Ktest_north_hotel.csr, .-r~r, xlab="d", ylab="K(d)-r")
plot(Ktest_north_entire.csr, .-r~r, xlab="d", ylab="K(d)-r")
plot(Ktest_north_private.csr, .-r~r, xlab="d", ylab="K(d)-r")
plot(Ktest_north_shared.csr, .-r~r, xlab="d", ylab="K(d)-r")

Northeast Region

Ktest_northeast_hotel <- Kest(ppp_store4$northeast$hotel, correction = "Ripley")
Ktest_northeast_hotel.csr <- envelope(ppp_store4$northeast$hotel, Kest, nsim=99, rank=1, glocal=TRUE)
Generating 99 simulations of CSR  ...
1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20,
21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40,
41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60,
61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80,
81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98, 
99.

Done.
Ktest_northeast_entire <- Kest(ppp_store4$northeast$entire, correction = "Ripley")
Ktest_northeast_entire.csr <- envelope(ppp_store4$northeast$entire, Kest, nsim=99, rank=1, glocal=TRUE)
Generating 99 simulations of CSR  ...
1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20,
21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40,
41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60,
61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80,
81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98, 
99.

Done.
Ktest_northeast_private <- Kest(ppp_store4$northeast$private, correction = "Ripley")
Ktest_northeast_private.csr <- envelope(ppp_store4$northeast$private, Kest, nsim=99, rank=1, glocal=TRUE)
Generating 99 simulations of CSR  ...
1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20,
21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40,
41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60,
61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80,
81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98, 
99.

Done.
Ktest_northeast_shared <- Kest(ppp_store4$northeast$shared, correction = "Ripley")
Ktest_northeast_shared.csr <- envelope(ppp_store4$northeast$shared, Kest, nsim=99, rank=1, glocal=TRUE)
Generating 99 simulations of CSR  ...
1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20,
21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40,
41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60,
61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80,
81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98, 
99.

Done.
summary(ppp_store4$northeast$all$marks)
Entire home/apt      Hotel room    Private room     Shared room 
             25               0             108               2 
par(mfrow=c(2,2))
plot(Ktest_northeast_hotel.csr, .-r~r, xlab="d", ylab="K(d)-r")
plot(Ktest_northeast_entire.csr, .-r~r, xlab="d", ylab="K(d)-r")
plot(Ktest_northeast_private.csr, .-r~r, xlab="d", ylab="K(d)-r")
plot(Ktest_northeast_shared.csr, .-r~r, xlab="d", ylab="K(d)-r")

K test using fourier transform

Create ppp for all of Singapore with only the subzones containing listings and dissolve the boundaries

sg_owin <- mpsz_sf3 %>% st_union(.) %>% as(., Class="Spatial") %>% as(., "owin")  
ppp_all <- listing_ppp_jit[sg_owin] %>% rescale(., 1000, "km")
plot(split(ppp_all))

Create ppp store of the 4 different room types

types_room <- unique(listings_sf$room_type)
ppp_store <- list()

for (i in seq_along(types_room)) {
  room_name <- paste(tolower(word(types_room[i],1)))
  var_name <- paste("ppp", room_name, sep="_")
  ppp_store[[room_name]] <- ppp_all[ppp_all$marks == types_room[i]]
}

Calculate the sigma to pass into FFT function. Create sigma using the bw.diggle() function for the various room types.

sigma_hotel <- bw.diggle(ppp_store$hotel)
sigma_shared <- bw.diggle(ppp_store$shared)
sigma_private <- bw.diggle(ppp_store$private)
sigma_entire <- bw.diggle(ppp_store$entire)

Hotel Rooms

K-test using FFT - Hotel Room

ptm <- proc.time()
set.seed(123)
Kfft_hotel <- Kest.fft(ppp_store$hotel, sigma_hotel)
Kfft_hotel.csr <- envelope(ppp_store$hotel, Kest.fft, sigma = sigma_hotel, nsim = 300, rank = 1, global=TRUE)
Generating 300 simulations of CSR  ...
1, 2, 3, 4.6.8.10.12.14.16.18.20.22.24.26.28.30.32.34
.36.38.40.42.44.46.48.50.52.54.56.58.60.62.64.66.68.70.72.74
.76.78.80.82.84.86.88.90.92.94.96.98.100.102.104.106.108.110.112.114
.116.118.120.122.124.126.128.130.132.134.136.138.140.142.144.146.148.150.152.154
.156.158.160.162.164.166.168.170.172.174.176.178.180.182.184.186.188.190.192.194
.196.198.200.202.204.206.208.210.212.214.216.218.220.222.224.226.228.230.232.234
.236.238.240.242.244.246.248.250.252.254.256.258.260.262.264.266.268.270.272.274
.276.278.280.282.284.286.288.290.292.294.296.298.
300.

Done.
proc.time() - ptm
   user  system elapsed 
  21.21    1.22   38.49 
par(mfrow=c(1,2))
plot(Kfft_hotel.csr, . - r ~ r, xlab="d", ylab="K(d)-r")
plot(Kfft_hotel.csr, . - r ~ r, xlab="d", ylab="K(d)-r", xlim=range(0,1))

Shared rooms

K-test using FFT for shared room

set.seed(123)
ptm <- proc.time()
Kfft_shared <- Kest.fft(ppp_store$shared, sigma_shared)
Kfft_shared.csr <- envelope(ppp_store$shared, Kest.fft, sigma = sigma_shared, nsim = 300, rank = 1, global=TRUE)
Generating 300 simulations of CSR  ...
1, 2, 3, 4.6.8.10.12.14.16.18.20.22.24.26.28.30.32.34
.36.38.40.42.44.46.48.50.52.54.56.58.60.62.64.66.68.70.72.74
.76.78.80.82.84.86.88.90.92.94.96.98.100.102.104.106.108.110.112.114
.116.118.120.122.124.126.128.130.132.134.136.138.140.142.144.146.148.150.152.154
.156.158.160.162.164.166.168.170.172.174.176.178.180.182.184.186.188.190.192.194
.196.198.200.202.204.206.208.210.212.214.216.218.220.222.224.226.228.230.232.234
.236.238.240.242.244.246.248.250.252.254.256.258.260.262.264.266.268.270.272.274
.276.278.280.282.284.286.288.290.292.294.296.298.
300.

Done.
proc.time() - ptm
   user  system elapsed 
  19.92    1.08   31.46 
par(mfrow=c(1,2))
plot(Kfft_shared.csr, . -r ~r, xlab="d", ylab="K(d)-r")
plot(Kfft_shared.csr, . - r ~ r, xlab="d", ylab="K(d)-r", xlim = range(0,1))

Private rooms

ptm <- proc.time()
Kfft_private <- Kest.fft(ppp_store$private, sigma_private)
Kfft_private.csr <- envelope(ppp_store$private, Kest.fft, sigma = sigma_private, nsim = 300, rank = 1, global=TRUE)
Generating 300 simulations of CSR  ...
1, 2, 3, 4.6.8.10.12.14.16.18.20.22.24.26.28.30.32.34
.36.38.40.42.44.46.48.50.52.54.56.58.60.62.64.66.68.70.72.74
.76.78.80.82.84.86.88.90.92.94.96.98.100.102.104.106.108.110.112.114
.116.118.120.122.124.126.128.130.132.134.136.138.140.142.144.146.148.150.152.154
.156.158.160.162.164.166.168.170.172.174.176.178.180.182.184.186.188.190.192.194
.196.198.200.202.204.206.208.210.212.214.216.218.220.222.224.226.228.230.232.234
.236.238.240.242.244.246.248.250.252.254.256.258.260.262.264.266.268.270.272.274
.276.278.280.282.284.286.288.290.292.294.296.298.
300.

Done.
proc.time() - ptm
   user  system elapsed 
  59.63    4.16   93.27 
par(mfrow=c(1,2))
plot(Kfft_private.csr, . -r ~r, xlab="d", ylab="K(d)-r")
plot(Kfft_private.csr, . - r ~ r, xlab="d", ylab="K(d)-r", xlim = range(0,1))

Entire home/ apartments

ptm <- proc.time()
Kfft_entire <- Kest.fft(ppp_store$entire, sigma_entire)
Kfft_entire.csr <- envelope(ppp_store$entire, Kest.fft, sigma = sigma_entire, nsim = 300, rank = 1, global=TRUE)
Generating 300 simulations of CSR  ...
1, 2, 3, 4.6.8.10.12.14.16.18.20.22.24.26.28.30.32.34
.36.38.40.42.44.46.48.50.52.54.56.58.60.62.64.66.68.70.72.74
.76.78.80.82.84.86.88.90.92.94.96.98.100.102.104.106.108.110.112.114
.116.118.120.122.124.126.128.130.132.134.136.138.140.142.144.146.148.150.152.154
.156.158.160.162.164.166.168.170.172.174.176.178.180.182.184.186.188.190.192.194
.196.198.200.202.204.206.208.210.212.214.216.218.220.222.224.226.228.230.232.234
.236.238.240.242.244.246.248.250.252.254.256.258.260.262.264.266.268.270.272.274
.276.278.280.282.284.286.288.290.292.294.296.298.
300.

Done.
proc.time() - ptm
   user  system elapsed 
  56.27    5.38   89.42 
par(mfrow=c(1,2))
plot(Kfft_entire.csr, . -r ~r, xlab="d", ylab="K(d)-r")
plot(Kfft_entire.csr, . - r ~ r, xlab="d", ylab="K(d)-r", xlim = range(0,1))

Summary of Cluster Radius

Save clustering distance for density plot

kdist <- list("private" = 0.4, "entire" = 0.3, "hotel" = 0.25, "shared"=0.6)

First Order Anaylsis (Kernel Density Estimation)

# Function maps the ppp (in km) with either an adaptive bandwidth ("adaptive) or the specified bandwidth to be used ("bw.diggle, bw.ppl, etc) 
kdemap <- function(ppp, y, polyshape, gtitle) {
  # ppp.km <- rescale(ppp, 1000, "km")
# Computing kernel density estimation using specified bandwidth
    if (y == "adaptive") {
      kde_ppp <- adaptive.density(ppp, method = "kernel")
    } else if (is.numeric(y)) {
      kde_ppp <- density(ppp, sigma = y, edge = TRUE, kernel = "gaussian")
      }
      else {
      kde_ppp <- density(ppp, sigma = eval(as.name(y)), edge = TRUE, kernel = "gaussian")
    }
  rastmap <- as.SpatialGridDataFrame.im(kde_ppp) %>% raster(.)
  projection(rastmap) <- CRS("+init=EPSG:3414 +units=km")
  rastmap1 <- disaggregate(rastmap, fact=3) # increase resolution of the raster map by factor of 3 to approximately 50m 

  tm_shape(sg_osm) +
    tm_rgb() +
  tm_shape(polyshape) +
    # tm_text("SUBZONE_N", size = 0.6) +
    tm_polygons(alpha = 0) +
  tm_shape(rastmap1) +
    tm_raster("v", alpha = 0.7, palette = "YlOrBr") +
    tm_layout(legend.position = c("right", "bottom"), frame = FALSE, main.title = gtitle, main.title.position="center", main.title.size = 1.25) +
    tm_view(set.zoom.limits = c(11, 17))
}

KDE for different room types

Private rooms

kdemap(ppp_store$private, kdist$private, mpsz_sf3[,3], "KDE private rooms")

Entire home/ apartments

tmap_mode("view")
kdemap(ppp_store$entire, kdist$entire, mpsz_sf3[,3], "KDE entire homes/apt")

Shared rooms

kdemap(ppp_store$shared, kdist$shared, mpsz_sf3[,3], "KDE shared rooms")

Hotel rooms

kdemap(ppp_store$hotel, kdist$hotel, mpsz_sf3[,3], "KDE hotel rooms")

Using adaptive bandwidth

kdemap(ppp_store$private, "adaptive", mpsz_sf3[,3], "KDE private rooms adaptive bw")
tmap_mode("view")
kdemap(ppp_store$entire, "adaptive", mpsz_sf3[,3], "KDE entire homes/apt adaptive bw")

UI for Proposed Idea

Kernel Density Estimation

Second Order Analysis & Chloropleth Map